The aim of this project was to evaluate whether or not geotagged social media data can be useful in providing insight into a region’s “Sense of Place” using Santa Barbara as a case study.
Sense of Place can be defined as the connection people feel to their geographic surroundings, including both the natural and built environment. Locations with a strong sense of place often have a strong identity felt by both locals and visitors.
Not surprisingly, tourists and locals both tweet about nature. Tourists tweet about nature more (X%), but stick to the popular tourist sites in town, including the wharf, waterfront, zoo, Santa Barbara Bowl, and more. Santa Barbara locals are also found at these sites, just not in as high a proportion. Natural areas that are further from the downtown
There is significant overlap in tourist and local patterns within the downtown area, indicating that tourists and locals alike share a fondness for the same areas and things.
The easy answer - I live here! Since I know the city and surrounding areas rather well, I could quickly look at spatial patterns and understand what exists at each location. The total number of tweets coming from Santa Barbara is also manageable compared to that of a much larger city.
Also, Santa Barbara is known for being a tourist town and for having beautiful natural and built landscapes (ok - I might be a bit biased here). Santa Barbara sits between the mountains and the ocean just 1.5 hours north of LA and has excellent recreation, dining, and entertainment options. It’s no surprise that a lot of UCSB students end up sticking around after graduation, myself included 🙋‍♀️.
Going into this project, I thought that Twitter data would be easily accessible, based on the number of projects I had seen that used Twitter data and related R packages. But I quickly learned that this was not the case: Twitter only allows free public access to the past 9 days of tweets. This was a problem since we wanted all tweets from January 1, 2015 through December 31, 2019.
Twitter data was obtained freely through an established partnership between UCSB Library and Crimson Hexagon. Before downloading, the data was queried to meet the following conditions:
Crimson Hexagon only allows 10,000 randomly selected tweets to be exported at a time, manually, in .xls format. Due to this restriction, I downloaded the data by hand in 2-day increments in order to capture all tweets (😓). This took a significant amount of point-and-click time, as you can imagine!
Once downloaded, the Twitter data did not contain all the desired information, including whether or not each tweet was geotagged, which was vital to this project. To get this information I stepped outside of my R comfort zone and used the Python twarc library. This library can be used to “rehydrate” Twitter data using individual tweet IDs, and then store all associated tweet information as .json files. From here I was able to remove all tweets that did not have a geotag, giving a total of 79,981 tweets.
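The filtering step after hydration could look roughly like the sketch below. It is not the code used for this project. It assumes the hydrated tweets are stored one JSON object per line and follow the Twitter v1.1 tweet layout, where a geotag shows up as a non-null `coordinates` or `place` field. Treating either field as a geotag is an assumption, and the three sample records are made up for illustration. The hydration call itself is left out because it needs API credentials.

```python
import json

def is_geotagged(tweet: dict) -> bool:
    """True if the tweet has a point coordinate or a tagged place (assumed definition)."""
    return tweet.get("coordinates") is not None or tweet.get("place") is not None

def keep_geotagged(jsonl_lines):
    """Yield parsed tweets that carry a geotag, skipping blank lines."""
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        if is_geotagged(tweet):
            yield tweet

# Made-up stand-in for a hydrated file: one JSON object per line.
sample_lines = [
    '{"id_str": "1", "coordinates": {"type": "Point", "coordinates": [-119.714, 34.4258]}, "place": null}',
    '{"id_str": "2", "coordinates": null, "place": {"full_name": "Santa Barbara, CA"}}',
    '{"id_str": "3", "coordinates": null, "place": null}',
]

geotagged_ids = [t["id_str"] for t in keep_geotagged(sample_lines)]
print(geotagged_ids)
```

With a real file, `sample_lines` would be replaced by an open file handle, since iterating over a file yields one line at a time.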
Here is a sample of the tweet data:
| Month | Day | Time | Year | full_text | user_location | retweet_count | favorite_count | month_num | date |
|---|---|---|---|---|---|---|---|---|---|
| Jan | 31 | 01:38:54 | 2016 | Is the #greenflash at #sunset a myth? Sounds good anyway, 38 seconds of joy. @ Arroyo Burro… https://t.co/Lfd9xEfuyS | Santa Barbara | 0 | 0 | 1 | 2016-01-31 |
| Nov | 11 | 22:43:49 | 2016 | My Days: 11/10/2016: ➡" PT11 #HappyBirthday #ButterflyBeach "⬅️ #Scorpio #Horoscopo #Magic #1110… https://t.co/EVOKEqVWTM | Cerritos, CA | 0 | 0 | 11 | 2016-11-11 |
| Apr | 14 | 01:49:27 | 2016 | I’m going to live within this walled garden, and I can’t wait! #santabarbara @ Santa Barbara,… https://t.co/o16lFfs0Mr | Los Angeles, CA | 0 | 3 | 4 | 2016-04-14 |
| Jul | 7 | 14:46:08 | 2015 | Bulleit the wonder #corgi. Seven months of awesome. http://t.co/lhHAVpy5b8 | Bend, OR | 1 | 1 | 7 | 2015-07-07 |
| Aug | 23 | 00:03:50 | 2018 | PAL band practicing for Palpalooza this Saturday #palpalooza #sbpal #palband @ Santa Barbara Police Activities League https://t.co/l1eoQMVP4L | Santa Barbara, CA | 1 | 0 | 8 | 2018-08-23 |
| Jan | 10 | 06:43:33 | 2016 | This place is cool… Also, free beer. - Drinking a Tecate by @cuamocmx @ Carriage Museum — https://t.co/WDIWAgbhpW #photo | iPhone: 34.227627,-119.182358 | 0 | 0 | 1 | 2016-01-10 |
| Apr | 4 | 16:54:24 | 2017 | Just posted a photo @ Santa Barbara Waterfront https://t.co/eI2vaWet1n | Eugene, OR | 0 | 0 | 4 | 2017-04-04 |
Almost immediately after plotting tweets over time, you can see that the total number of geotagged tweets goes down over time. Most noticeably, there is a significant drop in tweets at the end of April 2015. It seems this is because “a change in Twitter’s ‘post Tweet’ user-interface design results in fewer Tweets being geo-tagged” (source). The first 4 months of 2015 have 15,720 tweets, or roughly 19% of all tweets. To reduce skew in the data and to exclude tweets from those months that may have been geotagged without the user’s knowledge, I moved forward with all tweets from May 1, 2015 through the end of 2019.
The spatial distribution of tweets highlights areas of higher population density and tourist areas in downtown Santa Barbara.
There is a single coordinate that has over 11,000 tweets reported across all years. It is near De La Vina between Islay and Valerio. There is nothing remarkable about this site so I assume it is the default coordinate when people tag “Santa Barbara” generally. The coordinate is 34.4258, -119.714.
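A pile-up like this is easy to spot by counting how often each coordinate pair occurs. The sketch below is illustrative only: the function name and the rounding to four decimal places are assumptions, and the point list is a toy sample that reuses the default coordinate reported above.

```python
from collections import Counter

def most_common_coordinate(points, ndigits=4):
    """Return ((lat, lon), count) for the most frequent rounded coordinate pair."""
    rounded = [(round(lat, ndigits), round(lon, ndigits)) for lat, lon in points]
    return Counter(rounded).most_common(1)[0]

# Toy sample: three tweets on the suspected default point, one elsewhere.
points = [
    (34.4258, -119.714),
    (34.4258, -119.714),
    (34.4133, -119.6856),
    (34.4258, -119.714),
]

top_coord, top_count = most_common_coordinate(points)
print(top_coord, top_count)
```

A coordinate whose count dwarfs every other location is a good candidate for a platform default rather than a real hotspot.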
As you zoom in on the map, clusters will disaggregate. You can click on blue points to see the tweet.
Each hexagon shows the log10 density of tweets in that area. The highest number of tweets in a single location is around 11,000 (deep purple hex). This hex includes the default Santa Barbara coordinate, which appears to be assigned when a tweet is geotagged to the city of Santa Barbara without a precise location.
This project aimed to understand if and how preferences differ between tourists and locals for nature-based places within the Santa Barbara area. In order to test this, I needed a way to classify users as tourists or locals. I ended up using a two-step process:
This is not fool-proof and there are definitely instances where people visit and tweet from Santa Barbara more than two months a year, especially if they are visiting family or live within a couple hours driving distance, but without more data (and time) to determine where “tourists” truly live, this will have to do.
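To make the idea concrete, here is a hypothetical sketch of such a two-step rule. The two steps are guesses, not the project's actual rules: step 1 treats a user as local if their profile location mentions Santa Barbara, and step 2 treats anyone who tweets from the area in more than two distinct months of a single year as local. `LOCAL_HINTS`, `MAX_TOURIST_MONTHS`, and the sample users are invented for illustration.

```python
from collections import defaultdict

LOCAL_HINTS = ("santa barbara",)   # hypothetical; a real list might include nearby towns
MAX_TOURIST_MONTHS = 2             # assumed threshold: more months than this means "local"

def classify_users(tweets):
    """tweets: iterable of (user_id, user_location, 'YYYY-MM-DD').
    Returns {user_id: 'local' or 'tourist'} under the assumed two-step rule."""
    months = defaultdict(lambda: defaultdict(set))  # user -> year -> months with a tweet
    self_reported_local = set()
    for user_id, user_location, date in tweets:
        year, month = date[:4], date[5:7]
        months[user_id][year].add(month)
        # Step 1 (assumed): profile location mentions Santa Barbara.
        if any(hint in (user_location or "").lower() for hint in LOCAL_HINTS):
            self_reported_local.add(user_id)
    labels = {}
    for user_id, by_year in months.items():
        # Step 2 (assumed): tweeting in many distinct months of one year.
        busiest_year = max(len(m) for m in by_year.values())
        if user_id in self_reported_local or busiest_year > MAX_TOURIST_MONTHS:
            labels[user_id] = "local"
        else:
            labels[user_id] = "tourist"
    return labels

sample = [
    ("a", "Santa Barbara, CA", "2016-01-31"),
    ("b", "Eugene, OR", "2017-04-04"),
    ("c", "Los Angeles, CA", "2016-01-10"),
    ("c", "Los Angeles, CA", "2016-03-02"),
    ("c", "Los Angeles, CA", "2016-07-19"),
]
labels = classify_users(sample)
print(labels)
```

User `c` shows the weakness described above: a frequent visitor from Los Angeles ends up labeled as a local under this rule.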
There are 21,811 tweets from tourists and 45,420 tweets from locals (32% and 68%, respectively). There are 12,460 unique tourists and just 1,893 unique local users.
The following map shows areas that have more tweets from locals (orange) or tourists (purple). Note that the values indicate the log10 of the absolute difference between the number of tweets from each user group. So if a hex is purple and has a value of 2, there are about 100 more tweets from tourists than from locals at that location (a difference in counts, not a ratio).
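A quick worked example of that scale may help. Since log10 of 100 is 2, a hex value of 2 corresponds to a gap of 100 tweets between the two groups. The sketch below is an illustration, not the mapping code: the function name is made up, and using the sign of the result to stand in for the map colors (positive for more tourist tweets, negative for more local tweets) is an assumption.

```python
import math

def signed_log_difference(n_tourist, n_local):
    """log10 of the absolute difference in tweet counts, signed by which group has more."""
    diff = n_tourist - n_local
    if diff == 0:
        return 0.0  # no difference; log10(0) is undefined, so return 0 by convention
    return math.copysign(math.log10(abs(diff)), diff)

tourist_heavy = signed_log_difference(150, 50)   # 100 more tourist tweets
local_heavy = signed_log_difference(5, 1005)     # 1000 more local tweets
print(tourist_heavy, local_heavy)
```

In the first case tourists have 100 more tweets than locals but only 3 times as many, which is why the value should be read as a difference in counts. A difference of exactly one tweet also maps to 0 on this scale, the same as no difference at all.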
For this project, we wanted to understand how these two groups engage with the natural environment within Santa Barbara, and whether or not patterns through time and space could be used to understand what is and is not important to people.
Ideally I would’ve used an established nature “lexicon” (definition: the vocabulary of a language, an individual speaker or group of speakers, or a subject), but my search for such a thing turned up empty. So I created my own dictionary of 67 words that I think would qualify a tweet as being “nature-based”. These include recreational words, natural features, animals, and environmental words. I fully recognize that this dictionary is biased towards my view of nature-based words and tailored to best capture Santa Barbara-centric tweets. I would not recommend using it for non-coastal California areas.
## [1] "hike" "trail" "hiking" "camping" "tent"
## [6] "climb" "summit" "fishing" "sail" "sailing"
## [11] "boat" "boating" "ship" "cruise" "cruising"
## [16] "bike" "biking" "dive" "diving" "surf"
## [21] "surfing" "paddle" "swim" "ocean" "beach"
## [26] "[^a-z]sea" "sand" "coast" "island" "wave"
## [31] "fish" "whale" "dolphin" "pacific" "crab"
## [36] "lobster" "water" "shore" "marine" "seawater"
## [41] "lagoon" "slough" "saltwater" "underwater" "tide"
## [46] "aquatic" "[^a-z]tree" "[^a-z]earth" "weather" "sunset"
## [51] "sunrise" "[^a-z]sun" "climate" "park" "wildlife"
## [56] "[^a-z]view" "habitat" "[^a-z]rock" "nature" "mountains"
## [61] "[^a-z]peak" "canyon" "pier" "wharf" "environment"
## [66] "ecosystem" "flower"
Let’s look at some examples of what tweets qualified as “nature-based”.
| date | full_text | user_location | user_type | nature_word |
|---|---|---|---|---|
| 2016-01-03 | strolling on the beach with my sweetie! @bacararesortsb @ bacara resort & spa https://t.co/mtizcusplt | Oklahoma City, OK | tourist | 1 |
| 2016-12-11 | the old people @ east beach batting cages https://t.co/bxt1o83ocg | Santa Barbara, CA | local | 1 |
| 2018-12-05 | 🍿 ‘tis the season! @shoppaseonuevo + @metrotheatres want to help you #surprise the movie 🎥 lover in your life irl! #tag 🏷 someone you would like to #give the #gift of #movies to in the… https://t.co/llyt1iertt | Santa Barbara, CA | local | 1 |
| 2016-03-15 | forced to enjoy this view during today’s dog walk. #santabarbara #travel @ san marcos foothills… https://t.co/mt7whmdrwv | Santa Barbara | local | 1 |
| 2017-05-07 | such a magical moment… moon peeking through the dark clouds to shine its light on a palm tree.… https://t.co/fggd3lolu4 | wherever the next plane lands | tourist | 1 |
| 2019-07-25 | just posted a photo @ four seasons resort the biltmore santa barbara https://t.co/m6iyewibjq | Kensington. London | tourist | 1 |
| 2016-04-26 | just posted a photo @ santa barbara beach https://t.co/1agy5nuzbm | Pleasanton, CA | tourist | 1 |
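The flagging step can be sketched as a regular-expression search over lowercased tweet text. The patterns below are a small subset copied from the dictionary above. Joining them into one alternation and lowercasing first are assumptions about how the matching was done, and the function name is made up.

```python
import re

# Subset of the dictionary above; "[^a-z]sea" requires a non-letter before "sea".
NATURE_PATTERNS = ["hike", "beach", "[^a-z]sea", "sunset", "[^a-z]view", "[^a-z]tree", "wharf"]
NATURE_RE = re.compile("|".join(NATURE_PATTERNS))

def is_nature_based(text: str) -> bool:
    """True if any dictionary pattern occurs anywhere in the lowercased text."""
    return NATURE_RE.search(text.lower()) is not None

examples = [
    "strolling on the beach with my sweetie!",
    "just posted a photo @ four seasons resort the biltmore santa barbara",
    "bulleit the wonder #corgi. seven months of awesome.",
    "sea lions everywhere",
]
flags = [is_nature_based(t) for t in examples]
print(flags)
```

Two properties of substring matching show up here. The "four seasons" tweet is flagged because " sea" appears inside " seasons", which is likely why a few rows in the table above do not look nature-related. And a tweet that starts with "sea" is missed, because `[^a-z]sea` needs a character in front of the word.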
All groups show increases in the proportion of tweets that are nature-based over time, even as the number of geotagged tweets declines.
Not surprisingly, there are fewer nature-based tweets than non-nature-based ones: 24% of all geotagged tweets are nature-based.
Of local tweeters, 21% of tweets are nature-based. Of tourists, 30% are nature-based.
To link tweet locations to what exists at those locations, we need a spatial dataset that tells us what is there. This could be roads, city parcel information, or, in our case, protected areas from the California Protected Areas Database (CPAD).
The CPAD is a GIS dataset depicting lands that are owned in fee and protected for open space purposes by over 1,000 public agencies or non-profit organizations.
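Linking a tweet to a site comes down to a point-in-polygon test. In practice a GIS library would handle this, along with map projections and polygons that have holes. The sketch below only shows the core idea with a plain ray-casting test. The site name `toy_park` and its rectangle are invented and are not real CPAD geometry.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test. polygon is a list of (lon, lat) vertices in order, not closed."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside  # each crossing to the east flips the state
    return inside

def site_for_point(lon, lat, sites):
    """Return the name of the first site polygon containing the point, or None."""
    for name, polygon in sites.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None

# Hypothetical rectangular site; real protected-area boundaries are far more detailed.
sites = {"toy_park": [(-119.70, 34.40), (-119.68, 34.40), (-119.68, 34.42), (-119.70, 34.42)]}

inside_site = site_for_point(-119.69, 34.41, sites)
outside_site = site_for_point(-119.714, 34.4258, sites)
print(inside_site, outside_site)
```

Tweets that fall outside every polygon get `None` and drop out of any per-site summary.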
We can look at the top 20 most popular tweeted-from sites. The green highlighted portion represents nature-based tweets. The number indicates what percentage of all tweets are nature-based at each site. Names in bold indicate over 50% of tweets are nature-based.
Going a bit further, we also looked at the number of unique visitors to these CPAD sites. By calculating the proportion of unique tourists and locals that visit these sites, we start to look at who goes where. This analysis is not limited to nature-based tweets.
At the lower end we see more locals than tourists visiting these sites. These tend to be less popular areas. On the upper end, we see sites that are more frequented overall, and more frequented by tourists. These include well-known areas like the Santa Barbara Harbor and Stearns Wharf. Those on the lower end that locals frequent more are either lesser-known (Shoreline Park and Alameda Park are both neighborhood parks) or further from the main tourist areas (e.g. Goleta Beach).
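One way to read "proportion of unique tourists and locals" is, for each site, the share of all unique users in a group who tweeted from that site at least once. The sketch below implements that reading. It is an assumption about the calculation rather than the project's code, and the user IDs and visit records are toy data.

```python
from collections import defaultdict

def visit_rates(visits):
    """visits: iterable of (site, user_id, user_type).
    Returns {site: {user_type: share of that group's unique users seen at the site}}."""
    all_users = defaultdict(set)                         # user_type -> every unique user
    site_users = defaultdict(lambda: defaultdict(set))   # site -> user_type -> users
    for site, user_id, user_type in visits:
        all_users[user_type].add(user_id)
        site_users[site][user_type].add(user_id)
    return {
        site: {
            user_type: len(groups.get(user_type, set())) / len(all_users[user_type])
            for user_type in all_users
        }
        for site, groups in site_users.items()
    }

# Toy records; repeat tweets by the same user at a site count once.
visits = [
    ("Santa Barbara Harbor", "t1", "tourist"),
    ("Santa Barbara Harbor", "t1", "tourist"),
    ("Santa Barbara Harbor", "t2", "tourist"),
    ("Santa Barbara Harbor", "l1", "local"),
    ("Shoreline Park", "l1", "local"),
    ("Shoreline Park", "l2", "local"),
    ("Shoreline Park", "t3", "tourist"),
]
rates = visit_rates(visits)
print(rates)
```

Normalizing within each group matters here because there are far more unique tourists than unique locals, so raw visitor counts would make almost every site look tourist-dominated.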
We can apply a sentiment analysis to the Twitter data to try to understand patterns and trends in the general sentiment of tweets.
The top graph shows the total number of geotagged tweets, which has gone down over time across tourists and locals.
The bottom graph shows average daily sentiment scores over time. Above 0 is positive, below 0 is negative. We see that tweets are mostly positive, and that average sentiment is rising over time.
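The daily averaging behind a graph like this can be sketched in a few lines. The sketch assumes each tweet already has a numeric sentiment score from whatever scoring tool is used. The scoring method itself is not shown, and the dates and scores below are made-up values.

```python
from collections import defaultdict
from statistics import mean

def daily_mean_sentiment(scored_tweets):
    """scored_tweets: iterable of ('YYYY-MM-DD', score). Returns {date: mean score}, sorted by date."""
    by_day = defaultdict(list)
    for date, score in scored_tweets:
        by_day[date].append(score)
    return {date: mean(scores) for date, scores in sorted(by_day.items())}

# Made-up scores: positive values are positive sentiment, negative values are negative.
scored = [
    ("2016-04-14", 0.75),
    ("2016-01-31", 0.5),
    ("2016-01-31", -0.25),
]
daily = daily_mean_sentiment(scored)
print(daily)
```

Days with only a handful of tweets will produce noisy averages, which is worth keeping in mind for the later years when geotagged tweets are scarce.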